ABSTRACT

Variances for adaptive estimators of the location parameter in a
family of symmetric distributions including the uniform, normal, and
double exponential are examined at small to moderate sample sizes. The
estimators are all trimmed means or means of trimmings, where the
proportion of trimming is determined by an easily computed measure of
nonnormality. Comparisons are made to the asymptotic variances.


Variances  for  Adaptive  Trimmed  Means 


John  E.  Boyer,  Jr. 
Southern  Methodist  University 


1.  INTRODUCTION 

In a recent article, Prescott (1978) discussed the use of adaptive
trimmed means and means of trimmings for estimating a location parameter
from a symmetric family of distributions. The proportion of the sample
trimmed or retained is determined by the value of the sample statistic Q,
a measure of the length of the tails of the distribution based on the means
of groups of observations from the extremes of the ordered sample.

Asymptotic properties based on the corresponding population quantity
were derived for several different such estimates under the assumption
that the underlying distribution belongs to the exponential power family
of distributions. Since the population quantity will not, in practice, be
available, the corresponding finite-sample properties are examined in the
study below and compared with the values found in Prescott's computations.


2.  Asymptotic Variances for Trimmed Means


Let $x_1 < x_2 < \cdots < x_n$ be an ordered sample of size n from a
population with distribution function F(x) and density function f(x). The
$\alpha$-trimmed mean is defined as

\[
m(\alpha) = \frac{1}{n(1-2\alpha)} \left\{ \sum_{i=[n\alpha]+2}^{n-[n\alpha]-1} x_i
  + (1 + [n\alpha] - n\alpha)\left(x_{[n\alpha]+1} + x_{n-[n\alpha]}\right) \right\},
\tag{2.1}
\]

where $[n\alpha]$ denotes the integer part of $n\alpha$.


The mean of the observations discarded in $m(\alpha)$ is the $\alpha$-mean of
the trimmings, denoted $m^c(\alpha)$, and is given by

\[
m^c(\alpha) = \frac{1}{2n\alpha} \left\{ \sum_{i=1}^{[n\alpha]} \left(x_i + x_{n-i+1}\right)
  + (n\alpha - [n\alpha])\left(x_{[n\alpha]+1} + x_{n-[n\alpha]}\right) \right\}.
\tag{2.2}
\]

It should be noted that the limiting forms of these estimators are commonly
encountered estimators: $m(0.5)$ is the median, $m(0) = m^c(0.5)$ is the
mean, and $m^c(0)$ is the midrange.
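The two definitions above can be sketched in code. The following is a minimal illustration rather than the author's program: it assumes a formulation in which order statistic i is identified with the interval [i-1, i] and weighted by its overlap with [nα, n(1-α)], which reproduces the fractional end weights of the two formulas whenever they are well defined. The names `trimmed_mean`, `mean_of_trimmings`, and `_inner_weights` are hypothetical, and the limiting cases (median and midrange) are handled explicitly.

```python
def _inner_weights(n, alpha):
    """Weight each order statistic receives in the alpha-trimmed mean.

    Observation i (1-based) is identified with the interval [i-1, i];
    its weight is the length of its overlap with [n*alpha, n*(1-alpha)].
    """
    lo, hi = n * alpha, n * (1.0 - alpha)
    return [max(0.0, min(i, hi) - max(i - 1, lo)) for i in range(1, n + 1)]


def trimmed_mean(sample, alpha):
    """m(alpha): weighted mean of the central part of the ordered sample."""
    x = sorted(sample)
    n = len(x)
    w = _inner_weights(n, alpha)
    total = sum(w)
    if total < 1e-12:  # alpha = 0.5: limiting case, the median
        mid = n // 2
        return x[mid] if n % 2 else 0.5 * (x[mid - 1] + x[mid])
    return sum(wi * xi for wi, xi in zip(w, x)) / total


def mean_of_trimmings(sample, alpha):
    """m^c(alpha): weighted mean of the observations that m(alpha) discards."""
    x = sorted(sample)
    n = len(x)
    w = [1.0 - wi for wi in _inner_weights(n, alpha)]
    total = sum(w)
    if total < 1e-12:  # alpha = 0: limiting case, the midrange
        return 0.5 * (x[0] + x[-1])
    return sum(wi * xi for wi, xi in zip(w, x)) / total
```

With a sample of size 10 and α = 0.25, for instance, the third and eighth order statistics each enter the trimmed mean with weight 1/2 and the mean of trimmings with weight 1/2, as the fractional terms in the definitions require.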

Prescott (1978) considers these estimators for the location parameter
$\theta$ in the exponential power family of symmetric distributions defined
by the density function

\[
f(x) = \frac{\tau}{2\Gamma(1/\tau)} \, e^{-|x-\theta|^{\tau}},
\qquad -\infty < x < \infty, \quad \tau \ge 1.
\tag{2.3}
\]

These distributions are symmetric about $\theta$ with variance
$\sigma^2 = \Gamma(3/\tau)/\Gamma(1/\tau)$.

If we regard $\gamma = 1/\tau$ as a continuous parameter in the interval
[0,1], this family may be thought of as containing distributions which
change gradually from the uniform ($\gamma = 0$), through short-tailed
symmetric distributions to the normal ($\gamma = 1/2$), then through
long-tailed symmetric distributions to the double exponential ($\gamma = 1$).
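As an illustration of how members of this family might be generated for a simulation, the sketch below uses the fact that if X has the density above, then |X - θ| raised to the power τ follows a gamma distribution with shape 1/τ and unit scale. This is one possible method and is not taken from the paper; the function names `sample_exp_power` and `exp_power_variance` are hypothetical, and the uniform limit (γ = 0) is treated as the uniform distribution on (θ - 1, θ + 1).

```python
import math
import random


def exp_power_variance(gamma):
    """Variance Gamma(3*gamma)/Gamma(gamma); its limit at gamma = 0 is 1/3."""
    if gamma == 0:
        return 1.0 / 3.0
    return math.gamma(3.0 * gamma) / math.gamma(gamma)


def sample_exp_power(gamma, n, rng, theta=0.0):
    """Draw n values with density proportional to exp(-|x - theta|**(1/gamma))."""
    if gamma == 0:  # tau -> infinity: uniform on (theta - 1, theta + 1)
        return [theta + rng.uniform(-1.0, 1.0) for _ in range(n)]
    out = []
    for _ in range(n):
        g = rng.gammavariate(gamma, 1.0)  # |X - theta|**tau ~ Gamma(1/tau, 1)
        sign = 1.0 if rng.random() < 0.5 else -1.0
        out.append(theta + sign * g ** gamma)
    return out
```

For γ = 1/2 this yields a normal distribution with variance 1/2, and for γ = 1 a double exponential distribution with variance 2, in agreement with the variance formula.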

Prescott discusses the robustness properties and derives the asymptotic
variances for $m(\alpha)$ and $m^c(\alpha)$ for distributions belonging to
this family by using influence curve techniques. As all of the above
estimators are unbiased for $\theta$ in all of the distributions belonging
to the exponential power family, the asymptotic variance of a particular
estimator, when compared to the Cramér-Rao lower bound, provides a measure
of the efficiency of the estimation.

As different estimators from the family of trimmed means and means of
trimmings are more efficient depending on which member of the exponential
power family is being considered, adaptive estimation techniques enter in a
natural way. In particular, several statistics which choose a trimming
proportion $\alpha$ based on the measure of nonnormality (or tailweight)

\[
Q = \frac{U_{(.05)} - L_{(.05)}}{U_{(.50)} - L_{(.50)}}
\tag{2.4}
\]

proposed by Hogg (1974), where $U_{(\beta)}$ ($L_{(\beta)}$) is the average
of the largest (smallest) $n\beta$ order statistics, with fractional items
used if $n\beta$ is not an integer, are presented as possible adaptive
estimators for the exponential power family. The choice of Q over other
measures of nonnormality or tailweight, such as the sample kurtosis, is
discussed in detail in Hogg (1972, 1974) and Davenport (1971); the choice of
the particular 5% and 50% proportions and some asymptotic properties of Q
are also discussed there.
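The computation of Q can be sketched as follows. This is an illustrative reading of the definition, assuming that a "fractional item" means the next order statistic enters the average with weight equal to the fractional part of nβ; the names `tail_mean` and `hogg_q` are hypothetical.

```python
import math


def tail_mean(ordered, beta, upper):
    """Average of the largest (upper=True) or smallest n*beta order statistics.

    A fractional item is used when n*beta is not an integer: the next
    order statistic enters with weight n*beta - floor(n*beta).
    """
    n = len(ordered)
    m = n * beta
    k = int(math.floor(m + 1e-12))   # number of whole observations
    frac = max(0.0, m - k)           # weight of the fractional item
    xs = ordered[::-1] if upper else ordered
    total = sum(xs[:k])
    if frac > 1e-12:
        total += frac * xs[k]
    return total / m


def hogg_q(sample):
    """Q = (U(.05) - L(.05)) / (U(.50) - L(.50))."""
    x = sorted(sample)
    num = tail_mean(x, 0.05, True) - tail_mean(x, 0.05, False)
    den = tail_mean(x, 0.50, True) - tail_mean(x, 0.50, False)
    return num / den
```

For a sample of size 10, nβ = 0.5 at the 5% level, so the numerator reduces to the sample range; for the evenly spaced values 1, 2, ..., 20 the statistic equals 19/10 = 1.9.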

Prescott's computations of the variances of the trimmed means suggested
above are, however, based on knowledge of the population quantity which
corresponds to the sample statistic Q. In practice, this population value
would not be known to the statistician; if it were, there would be no need
to adapt in the first place.
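For the three classical members of the family the population value of Q can be worked out in closed form, which gives a check on the values 1.90, 2.58, and 3.30 quoted for γ = 0, 1/2, and 1 in Table 1. The sketch below is an illustration under the assumption that the population quantity replaces each tail average by the corresponding conditional expectation; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def population_q_uniform():
    # Uniform(0, 1): the top 5% averages 0.975 and the top half 0.75;
    # the bottom 5% averages 0.025 and the bottom half 0.25.
    return (0.975 - 0.025) / (0.75 - 0.25)


def population_q_normal():
    # E[X | X > z] = phi(z) / P(X > z) for the standard normal.
    std = NormalDist()
    z = std.inv_cdf(0.95)
    upper_05 = std.pdf(z) / 0.05
    upper_50 = std.pdf(0.0) / 0.50
    return upper_05 / upper_50       # by symmetry, L = -U in both tails


def population_q_double_exponential():
    # P(X > q) = 0.5 * exp(-q) = 0.05 gives q = ln 10; the excess over q
    # is exponential with mean 1, and E[X | X > 0] = 1.
    return (math.log(10.0) + 1.0) / 1.0
```

These evaluate to 1.9, approximately 2.585, and approximately 3.303, respectively.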

Parr (1979) points out that the variance of an adaptive estimator will
depend on the relative frequency with which the adapting statistic, Q,
picks different trimming proportions. That is, the appropriate variances
to consider for the various adaptive estimators are the weighted sums of the
conditional variances, where the weights are the proportions of samples
that yield the corresponding values of Q, and the conditioning is on the
observed Q. Since Prescott's estimates of the variances use only the trimmed
mean or mean of the trimmings which appears to do best or "nearly" best for a
given member of the exponential power family, they do not truly reflect the
adaptive nature of the statistics. The results of taking that aspect of the
statistics into account will be seen in the simulation discussed below.

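Analytically, if T is the trimmed mean chosen by the adaptive procedure,
then

\[
\operatorname{Var} T = E(\operatorname{Var}[T \mid Q]) + \operatorname{Var}\{E[T \mid Q]\}.
\]

However, since all the trimmed means are symmetric estimators and all of the
members of the exponential power family are symmetric distributions,
$E[T \mid Q] = \theta$ and thus the second term will always be zero. If
$\alpha_0$ is chosen to (nearly) minimize $\operatorname{Var}[T_\alpha]$, where
$T_\alpha$ denotes the estimator with fixed trimming proportion $\alpha$, then
$\operatorname{Var}[T \mid Q] \ge \operatorname{Var}[T_{\alpha_0}]$, so
$\operatorname{Var} T = E(\operatorname{Var}[T \mid Q]) \ge \operatorname{Var}[T_{\alpha_0}]$.
Thus the asymptotic variances given by the influence curve calculations will
in general be smaller than the variances which can be achieved in practice,
and the difference can be attributed to the problem of estimating the
population value of Q.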


3.  Adaptive  Trimmed  Means 

The adaptive estimators investigated in the simulation study discussed
below are the same as those presented in Prescott (1978). The first is an
estimator suggested by Hogg (1974), given by

\[
T_1 = \begin{cases}
m^c(1/4), & Q < 2.0 \\
m(0), & 2.0 \le Q < 2.6 \\
m(3/16), & 2.6 \le Q < 3.2 \\
m(3/8), & 3.2 \le Q.
\end{cases}
\tag{3.1}
\]

The second is an estimator suggested by Prescott (and denoted by T* in that
article):

\[
T_2 = \begin{cases}
m^c(.2), & Q < 2.2 \\
m^c(.3), & 2.2 \le Q < 2.4 \\
m(0), & 2.4 \le Q \le 2.8 \\
m(.2), & 2.8 < Q \le 3.0 \\
m(.3), & 3.0 < Q.
\end{cases}
\tag{3.2}
\]

The third, also suggested by Prescott and denoted by T** there, is an
extension of the notion that more intervals for Q would produce asymptotic
variances closer to the minima suggested by the Cramér-Rao bounds. Thus the
statistic below adapts continuously rather than in a step-wise fashion:

\[
T_3 = \begin{cases}
m^c(0) = \text{midrange}, & Q < 1.9 \\
m^c[(Q - 1.9) \times 0.7], & 1.9 \le Q < 2.6 \\
m[(Q - 2.6) \times 0.7], & 2.6 \le Q \le 3.3 \\
m(.5) = \text{median}, & 3.3 < Q.
\end{cases}
\tag{3.3}
\]
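The three selection rules above amount to a lookup from the observed Q to an estimator type and a proportion. A minimal sketch follows; the function names and the `(kind, alpha)` return convention are hypothetical, and the factor 0.7 in the continuous rule is kept as a named parameter because it is read from the printed definition and may need adjusting.

```python
def t1_rule(q):
    """Hogg's estimator (3.1): returns (kind, alpha) for an observed Q."""
    if q < 2.0:
        return ("trimmings", 0.25)
    if q < 2.6:
        return ("trimmed", 0.0)
    if q < 3.2:
        return ("trimmed", 3.0 / 16.0)
    return ("trimmed", 3.0 / 8.0)


def t2_rule(q):
    """Prescott's five-interval estimator (3.2)."""
    if q < 2.2:
        return ("trimmings", 0.2)
    if q < 2.4:
        return ("trimmings", 0.3)
    if q <= 2.8:
        return ("trimmed", 0.0)
    if q <= 3.0:
        return ("trimmed", 0.2)
    return ("trimmed", 0.3)


def t3_rule(q, factor=0.7):
    """Continuously adapting estimator (3.3); `factor` is an assumed reading."""
    if q < 1.9:
        return ("trimmings", 0.0)            # midrange
    if q < 2.6:
        return ("trimmings", (q - 1.9) * factor)
    if q <= 3.3:
        return ("trimmed", (q - 2.6) * factor)
    return ("trimmed", 0.5)                  # median
```

Here "trimmed" stands for the trimmed mean m(α) and "trimmings" for the mean of trimmings; a proportion of 0 under "trimmed" is the sample mean.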

The simulation consisted of 2000 repetitions for each of the three
statistics suggested above, at each of the nine parameter values
$\gamma = 0(1/8)1$ and at each of the three sample sizes n = 10, 20, and 40.
The results of the simulation, as well as the asymptotic variances provided
by the influence curve calculations and the values of the Cramér-Rao lower
bound, appear in Table 1. In addition, Figures A, B, and C are graphs of the
corresponding data for the estimators $T_1$, $T_2$, and $T_3$ respectively,
using diagrams as in Prescott's work.
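A simulation of this kind can be organized around a small harness such as the one below. It is a sketch, not the program used for the study: the names `standardized_variance` and `normal_sampler` are hypothetical, the standardization n·Var(T)/σ² is inferred from the statement that the sample mean has standardized variance 1.0, and a normal sampler stands in for the full exponential power family.

```python
import random
import statistics


def standardized_variance(estimator, sampler, sigma2, n, reps=2000, seed=0):
    """Monte Carlo estimate of n * Var(estimator) / sigma2."""
    rng = random.Random(seed)
    estimates = [estimator(sampler(n, rng)) for _ in range(reps)]
    return n * statistics.pvariance(estimates) / sigma2


def normal_sampler(n, rng):
    """Standard normal sample; stands in for any member of the family."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


# Example: the sample mean and the median at n = 20 under normality.
v_mean = standardized_variance(statistics.mean, normal_sampler, 1.0, 20)
v_median = standardized_variance(statistics.median, normal_sampler, 1.0, 20)
```

Any estimator that maps a list of observations to a number, including an adaptive rule that first computes Q, can be passed as `estimator`. With 2000 repetitions the estimate for the sample mean should fall close to 1, and the median should show a visibly larger value at the normal distribution.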

4.  Properties of the Variances of $m(\alpha)$ and $m^c(\alpha)$

Since the adaptive estimator chosen depends on the sample value of Q,
estimation of the population value of Q is of interest in its own right.
Davenport (1971) has shown that for the uniform, normal, and double
exponential distributions, the sample Q is asymptotically distributed as a
normal random variable with mean equal to the population Q and finite
variance. In the appendix to this paper, the condition on the distribution
required for that result is shown to hold for any member of the exponential
power family.

As seen in Figure D, the tendency is for the sample Q to underestimate the
population Q at small to moderate sample sizes. The results graphed there
are the true values of Q minus the averages of the sample Q obtained from
2000 samples of the given sample size. Additionally, it is clear that the
magnitude of the underestimation increases as Q or $1/\tau$ increases, for
any of the sample sizes studied. This underestimation is a substantial
factor in the variance calculations.

As an example of the point illustrated in Section 2, consider the
estimator $T_2$ at $1/\tau = 5/8$ with samples of size 20. For this parameter
configuration the population Q is 2.77, so the asymptotic results are based
on the sample mean, which has (standardized) variance 1.0. In 2000 samples,
however, the sample Q was less than 2.4 715 times, between 2.4 and 2.8 699
times, and larger than 2.8 586 times. Thus the sample mean is chosen only
about 35% of the time in practice. The observed variance of 1.1222 reflects
the fact that in the 65% of the samples for which estimators other than m(0)
are chosen, the variance will be larger than that of m(0), which is nearly
best among trimmed means and means of trimmings.
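The arithmetic behind this example can be made explicit. The sketch below converts the reported counts into selection proportions and shows how a weighted sum of conditional variances would be formed; the function names are hypothetical, and no conditional variances are supplied because the paper does not report them.

```python
def selection_shares(counts):
    """Convert counts of selected estimators into proportions of all samples."""
    total = sum(counts.values())
    return {name: c / total for name, c in counts.items()}


def weighted_conditional_variance(shares, conditional_variances):
    """Sum of share * conditional variance over the selection classes."""
    return sum(shares[name] * conditional_variances[name] for name in shares)


# Counts reported in the text for T2 at 1/tau = 5/8 and n = 20.
t2_counts = {"below_2.4": 715, "2.4_to_2.8": 699, "above_2.8": 586}
t2_shares = selection_shares(t2_counts)
```

The middle class, in which the sample mean is selected, has share 699/2000 = 0.3495, which is the "about 35%" quoted above.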

On the other hand, when the adaptive rule, applied at the population value
of Q, does not choose a nearly optimal trimming proportion, the finite
sample results may give variances which are smaller than those suggested by
the asymptotic variances.

For example, for the 2000 samples of size 10 at $1/\tau = 1/8$, the sample
value of Q was less than 2.0 1358 times, between 2.0 and 2.6 572 times,
between 2.6 and 3.2 66 times, and larger than 3.2 4 times. Thus, if $T_1$ is
the estimator being considered, even though the population value of Q is
2.05 and the sample mean, m(0), is the estimator that should be chosen, more
than 67% of the sample runs choose $m^c(1/4)$. The variance for this
estimator could therefore also be expected to be quite different from that
of the sample mean. In fact, $m^c(1/4)$, with asymptotic variance .5869, is a
good deal closer to the best trimmed mean or mean of trimmings available for
$1/\tau = 1/8$ than the sample mean. Thus the estimator $T_1$, for samples of
size 10, actually performs considerably better than the asymptotic variance
would suggest. This is reflected in Figure A and Table 1, where the variance
for samples of size 10 is seen to be .8216, while the asymptotic value is
1.0. Note that the best available estimator at $\gamma = 1/8$ is a mean of
trimmings with $\alpha$ between .05 and .10, with an asymptotic variance of
approximately .40. It is seen that in this particular case, the fact that
the sample Q underestimates the population Q is advantageous.

The estimator $T_3$ exhibits the properties claimed by Prescott in his
concluding section. Particularly for small samples (n < 20) and long-tailed
($\gamma > 1/2$) distributions, the sample Q severely underestimates the
parameter Q. Consequently, the trimming proportion chosen is considerably
smaller than the optimal one, and the resulting variance is considerably
larger than that of the optimal trimmed mean. The performance is somewhat
better for shorter-tailed distributions, but still not particularly
impressive.

As the sample size increases, however, the continuous adaptation begins
to fare very well. By n = 40, the performance of $T_3$ is as good as that of
any of the estimators studied. This coincides with the assertions made by
Prescott.


TABLE 1

VARIANCES OF ESTIMATORS T1, T2, T3, BASED ON 2000 SAMPLES

1/τ                 .000    .125    .250    .375    .500    .625    .750    .875   1.000
Q                   1.90    2.05    2.20    2.40    2.58    2.77    2.95    3.13    3.30

T1  n = 10         .7692   .8216  1.0124  1.0805  1.1140  1.0608  1.0344  1.0242   .9729
    n = 20         .7564   .8670  1.0163  1.0973  1.0592  1.0253   .9388   .8597   .7330
    n = 40         .6710   .8079  1.0428  1.0580  1.0301   .9660   .9302   .7895   .6607
    asympt         .5000  1.0000  1.0000  1.0000  1.0000   .9389   .8655   .7552   .5478

T2  n = 10         .6327   .7140   .9471  1.0970  1.2435  1.2459  1.2915  1.3711  1.3200
    n = 20         .5524   .6763   .0891  1.0750  1.1254  1.1222  1.0657  1.0014   .8285
    n = 40         .4570   .5773   .8632  1.0123  1.0719  1.0126   .9871   .8506   .6809
    asympt         .4000   .5186   .7966  1.0000  1.0000  1.0000   .8632   .7071   .5844

T3  n = 10         .5682   .7014  1.0168  1.1937  1.3902  1.3593  1.4454  1.5070  1.4302
    n = 20         .4049   .6605   .9474  1.1645  1.2328  1.1912  1.1152  1.0206   .8595
    n = 40         .2374   .5850   .9262  1.0734  1.1206  1.0497  1.0060   .8530   .6902
    asympt         .0000   .4259   .7566   .9468  1.0013   .9397   .8575   .6883   .5000

Lower Bound            …   .3924       …       …       …   .9485   .8225   .6626   .5000

Figure C.  Asymptotic variance of T3, with plotted points for n = 10 (○),
n = 20 (+), and n = 40 (□).


5.  Concluding  Remarks 


Investigations using small to moderate sample sizes indicate that
adapting too closely may not be worth the effort. The sample Q will give
only a general idea of the population value of Q, and at small sample sizes
gives a value that is too small. In fact, in a slightly different setting (a
two-sample problem), Randles and Hogg (1973) propose deciding that the
underlying distributions are light, medium, or heavy tailed as

\[ Q < 2.08 - 2/N, \]

\[ 2.08 - 2/N \le Q < 2.96 - 5.5/N, \]

\[ 2.96 - 5.5/N \le Q, \]

respectively, where the samples are of size m and n and
$N = (m^2 + n^2)/(m + n)$. In a different discussion, Hogg (1974) recommends
an adapting scheme that uses only one trimmed mean if the sample size is
less than or equal to 10, one of two adaptively chosen trimmed means if
$10 < n \le 20$, one of three adaptively chosen trimmed means if
$20 < n \le 30$, etc., thus adapting more closely as the sample size
increases. Both of these suggestions and the current study support
Prescott's conclusion that if the sample size is fairly large ($n \ge 50$)
the continuously adapting $T_3$ should be a useful robust estimator, but if
n < 50, and particularly if one suspects long-tailed nonnormality, then
$T_2$ might be preferable.

6.  APPENDIX 

In Section 4, reference was made to the requirement on a distribution
in order that the sample Q might be asymptotically normal. In Davenport
(1971, p. 8), four conditions on a distribution are described as sufficient
to ensure that asymptotic property. The fourth condition, essentially a
requirement on smoothness of the tails of the distribution, comes directly
from Chernoff (1967, p. 61, Assumption 5). The Chernoff condition can easily
be shown to be satisfied under the assumption that the distribution is a
member of the exponential power family of distributions (with the exception
of $\tau = \infty$, the uniform distribution, for which the result can be
shown directly) by way of the following theorem, which establishes the
relationship between the density and the cumulative distribution function
for all members of the family.

Theorem. Let $f(x) = 2^{-1}\,\Gamma^{-1}(1 + 1/\tau)\exp(-|x|^{\tau})$,
$1 \le \tau < \infty$, be the density for a member of the exponential power
family, and let $F(\cdot)$ be the corresponding cumulative distribution
function. For any x > 0,

\[
\tau x^{\tau-1}\,(1 - F(x)) \;\le\; f(x) \;\le\;
\left(\tau x^{\tau-1} + (\tau - 1)\frac{1}{x}\right)(1 - F(x)).
\]

Proof: The derivative of f(x) is $-\tau x^{\tau-1} f(x)$ and the derivative
of 1 - F(x) is -f(x), so that

\[
f(x) = \int_x^{\infty} \tau y^{\tau-1} f(y)\,dy
     = \Big[-\tau y^{\tau-1}(1 - F(y))\Big]_x^{\infty}
       + \tau(\tau - 1)\int_x^{\infty} y^{\tau-2}(1 - F(y))\,dy,
\]

where the second equality follows from integration by parts. The second term
in the latter expression is nonnegative and the first term in that expression
is $\tau x^{\tau-1}(1 - F(x))$. Thus $f(x) \ge \tau x^{\tau-1}(1 - F(x))$, as
desired.

Now replace 1 - F(y) in the second term by its upper bound
$\frac{1}{\tau}\, y^{1-\tau} f(y)$ just obtained, and the result is

\begin{align*}
f(x) &\le \tau x^{\tau-1}(1 - F(x))
        + \tau(\tau - 1)\int_x^{\infty} y^{\tau-2}\,\frac{1}{\tau}\,y^{1-\tau} f(y)\,dy \\
     &= \tau x^{\tau-1}(1 - F(x)) + (\tau - 1)\int_x^{\infty} \frac{1}{y}\, f(y)\,dy \\
     &\le \tau x^{\tau-1}(1 - F(x)) + (\tau - 1)\frac{1}{x}\int_x^{\infty} f(y)\,dy \\
     &= (1 - F(x))\left(\tau x^{\tau-1} + (\tau - 1)\frac{1}{x}\right).
\end{align*}

Combining gives

\[
\tau x^{\tau-1}\,(1 - F(x)) \;\le\; f(x) \;\le\;
\left(\tau x^{\tau-1} + (\tau - 1)\frac{1}{x}\right)(1 - F(x)).
\qquad \text{Q.E.D.}
\]

Note that the inequalities are strict for $\tau > 1$ and that when
$\tau = 1$ the equality 1 - F(x) = f(x) holds for all x > 0. An equivalent
result holds for $x \le 0$, and the two conditions together provide all the
necessary machinery to satisfy the Chernoff requirements.
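The bounds of the theorem can be checked numerically in the normal case, $\tau = 2$, where the density is $e^{-x^2}/\sqrt{\pi}$ and the upper tail probability is one half of the complementary error function. The sketch below is only a spot check at a few points, not a proof, and the function name `bounds_tau2` is hypothetical.

```python
import math


def bounds_tau2(x):
    """Return (lower, f, upper) from the theorem for tau = 2 at x > 0."""
    f = math.exp(-x * x) / math.sqrt(math.pi)  # 1 / (2 * Gamma(3/2)) = 1 / sqrt(pi)
    tail = 0.5 * math.erfc(x)                  # 1 - F(x)
    lower = 2.0 * x * tail                     # tau * x**(tau - 1) * (1 - F)
    upper = (2.0 * x + 1.0 / x) * tail         # adds (tau - 1) / x
    return lower, f, upper


for x in (0.1, 0.5, 1.0, 2.0, 4.0):
    lower, f, upper = bounds_tau2(x)
    print(f"x={x}: {lower:.6e} < {f:.6e} < {upper:.6e}")
```

The two bounds draw together as x grows, which is the tail behavior the Chernoff condition is concerned with.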